Visual Language Modeling on CNN Image Representations
نویسندگان
چکیده
Measuring the naturalness of images is important to generate realistic images or to detect unnatural regions in images. Additionally, a method to measure naturalness can be complementary to Convolutional Neural Network (CNN) based features, which are known to be insensitive to the naturalness of images. However, most probabilistic image models have insufficient capability of modeling the complex and abstract naturalness that we feel because they are built directly on raw image pixels. In this work, we assume that naturalness can be measured by the predictability on high-level features during eye movement. Based on this assumption, we propose a novel method to evaluate the naturalness by building a variant of Recurrent Neural Network Language Models on pre-trained CNN representations. Our method is applied to two tasks, demonstrating that 1) using our method as a regularizer enables us to generate more understandable images from image features than existing approaches, and 2) unnaturalness maps produced by our method achieve state-of-the-art eye fixation prediction performance on two well-studied datasets.
منابع مشابه
Learning Document Image Features With SqueezeNet Convolutional Neural Network
The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state of the art model for this task. However, there are two major drawbacks for these classifiers: the huge computational power demand for...
متن کاملModeling Mid-level Visual Representations through Clustering in a Convolutional Neural Network
The nature of visual properties used in cortical perception is subject to considerable ongoing study. Features of intermediate complexity are particularly uncertain. Convolutional Neural Network (CNN) models, however, have proven to be quite effective in modeling human vision (Yamins et al., 2014) and have performed with great accuracy on image classification tasks (Krizhevsky et al., 2012). St...
متن کاملMean Box Pooling: A Rich Image Representation and Output Embedding for the Visual Madlibs Task
Question answering about real-world images is a relatively new research direction that requires a chain of machine visual perception, natural language understanding, and deductive capabilities to successfully come up with an answer on a question about visual content. In contrast to many classical Computer Vision problems such as recognition or detection, this task does not evaluate any internal...
متن کاملMeaningfulness of Religious Language in the Light of Conceptual Metaphorical Use of Image Schema: A Cognitive Semantic Approach
According to modern religious studies, religions are rooted in certain metaphorical representations, so they are metaphorical in nature. This article aims to show, first, how conceptual metaphors employ image schemas to make our language meaningful, and then to assert that image-schematic structure of religious expressions, by which religious metaphors conceptualize abstract meanings, is the ba...
متن کاملLearning Convolutional Text Representations for Visual Question Answering
Visual question answering is a recently proposed articial intelligence task that requires a deep understanding of both images and texts. In deep learning, images are typically modeled through convolutional neural networks, and texts are typically modeled through recurrent neural networks. While the requirement for modeling images is similar to traditional computer vision tasks, such as object ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1511.02872 شماره
صفحات -
تاریخ انتشار 2015